A start at opening the door to support for other languages. #103
It would seem that we no longer need the link attribute, |
Maybe a different tack... When we are fetching a page we already have a title to slug lookup table in the sitemap. So, the only real issues are with page creation, and ensuring that the sitemap is up to date. As we only create pages either on the origin server, or in local storage, there is little reason why this change could not be a breaking one - and introduce a level of cooperation between the client and server when a page is initially created, or forked. |
Good point. When a sitemap is present and the link text matches a sitemap entry then this mapping should override that of asStub. We should define text matching to be a case independent match when we know enough about the character set to apply this transformation. We now have at least two ways to "paint our way" out of this corner. I don't see any problem with applying both when possible. |
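A minimal sketch of that override, assuming the sitemap is a list of entries carrying `title` and `slug` fields and that the conventional slug drops every character outside `A-Za-z0-9-`. The names `resolve_slug` and `conventional_slug` are hypothetical and stand in for the client's own routines; `str.casefold` is used here as one possible case-independent comparison.

```python
import re


def conventional_slug(title):
    # Assumed model of the existing slug rule: whitespace becomes a dash,
    # anything outside A-Z, a-z, 0-9 and dash is dropped, result is lowercased.
    dashed = re.sub(r"\s", "-", title)
    return re.sub(r"[^A-Za-z0-9-]", "", dashed).lower()


def resolve_slug(link_text, sitemap):
    # Prefer a sitemap entry whose title matches the link text,
    # ignoring case; fall back to the conventional slug otherwise.
    wanted = link_text.casefold()
    for entry in sitemap:
        if entry["title"].casefold() == wanted:
            return entry["slug"]
    return conventional_slug(link_text)
```

With a sitemap entry titled "Åsnes" stored under the slug `asnes`, a link written as "åsnes" resolves to `asnes`; without the entry it falls back to `snes`.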
I dislike the necessity of having the sitemap. Currently one of my sitemaps is 1MB, which isn't huge, but certainly an inconvenience on mobile devices. E.g. what if you wanted to use wikipedia.org as a part of federated wiki? Maybe we could let the servers handle resolving the names instead? For example, when you request "/世界" the server would use an HTTP redirect to send you to "/world". The normalized slug / URL would also always be present in the page itself, so that when a new link is created it would always refer to the correct slug/title. |
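The server-side redirect could look roughly like the sketch below. It assumes the server keeps its own title-to-slug table; `resolve_request` and `title_to_slug` are hypothetical names, and the choice of status code 302 is an assumption rather than anything settled in this thread.

```python
from urllib.parse import unquote


def resolve_request(path, title_to_slug):
    # Decode the percent-encoded path back into the requested name.
    name = unquote(path.lstrip("/"))
    # A request for an existing slug is served directly.
    if name in title_to_slug.values():
        return (200, "/" + name)
    # A request for a known title is redirected to its slug.
    slug = title_to_slug.get(name)
    if slug is not None:
        return (302, "/" + slug)
    return (404, None)
```

The client never needs the full sitemap here: it asks for the title and follows whatever location the server returns.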
Maybe you could try the paul90/utf-8-pagename branch and tell us how it works for you? I would be interested in how interoperable you could make your server. When provided a title you would have all the information that the client has. You would be responsible for case-insensitive matching for those alphabets that have case. You would also be expected to match conventional slugs when handling CORS requests from clients that don't have this modification. I still think the sitemap approach has merit and would allow a server to number pages as some wiki implementations want to do. If you choose to not offer a sitemap then this indirection will not be available to you. I haven't thought through this case carefully yet. It seems that employing an alternate slug algorithm has similar interoperability considerations. |
I took a look at the changes, and it doesn't seem to have the history copying - but of course it could be added. I simply cannot see whether that approach would look good on the titlebar. Also I'm using a different client. The handling on the server side would be pretty trivial - i.e.
|
How different is your client? |
The current progress is here: https://github.com/raintreeinc/kbclient. Not complete and I haven't yet had time to completely fix some of the issues. I'm currently still migrating the first prototype to the new infrastructure. Essentially the client is a completely different codebase, but behavior is similar. Also, it has different goals than fedwiki, so it doesn't have (exactly) the same feature-set. Of course I try to keep it interoperable with the fedwiki protocol. The gist of the KnowledgeBase is "federated wiki between different people groups". Essentially this is an effort to merge usual help pages with engineering information and user-provided wiki information. I liked the navigation style of the Federated Wiki client, so I started from there. It provides a very fast way of navigating complex information. The Federation part gives really good boundaries for managing different wikis and their security, while still allowing one-way access. |
Another approach might be for the client to try a unicode-friendly slug first and if that doesn't work, try the original slug when different. Can you provide us a table of sample titles and how they would convert to the two slug formats in question? |
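A rough sketch of that fallback order, under stated assumptions: `unicode_slug` is a hypothetical Unicode-friendly rule (keep anything `str.isalnum` accepts, lowercase, collapse dashes) and is not the slugification from either codebase discussed here; `conventional_slug` models the existing rule; `fetch` is any callable that returns a page or `None`.

```python
import re


def conventional_slug(title):
    # Assumed model of the existing rule: drops everything outside A-Za-z0-9-.
    return re.sub(r"[^A-Za-z0-9-]", "", re.sub(r"\s", "-", title)).lower()


def unicode_slug(title):
    # Hypothetical Unicode-friendly rule: keep letters and digits of any
    # script, turn everything else into dashes, then tidy the dashes.
    kept = "".join(ch if ch.isalnum() else "-" for ch in title.lower())
    return re.sub(r"-+", "-", kept).strip("-")


def candidate_slugs(title):
    # Try the Unicode-friendly slug first, the original slug only when different.
    friendly, legacy = unicode_slug(title), conventional_slug(title)
    return [friendly] if friendly == legacy else [friendly, legacy]


def fetch_page(title, fetch):
    # Return the first (slug, page) pair the server can satisfy, else None.
    for slug in candidate_slugs(title):
        page = fetch(slug)
        if page is not None:
            return slug, page
    return None
```

For plain ASCII titles the two slugs coincide, so the cost of the extra request is only paid for titles the original rule would have mangled.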
These should give the general idea https://github.com/egonelbre/fedwiki/blob/master/slug_test.go#L9. kbclient contains the slugification code for javascript as well. |
I'm worried about my suggested slugification as well... Mainly, there may be some letters that aren't present in the current Unicode tables, which means that in the future you would need to start upgrading those tables for clients, otherwise it won't work for all cases. With the server-based approach you don't have that problem, because the server knows what pages it has and can appropriately discard those pages. Of course there is a problem with the server approach - if you update your naming/slugification scheme, links to your site will break. |
@egonelbre I thought your slugs looked nice and handled some edge cases like duplicate or leading/trailing dashes that were meant for the original but didn't get implemented in the rush to get the project started. I notice that you spell out some symbols. Is that to avoid specific meaning in a url while avoiding the %xx notation? I also notice that you pass the slash (/) with some reference to hierarchical names. Are they a requirement of your application? Or is this something you like from other wiki? I have avoided namespace concepts hoping that federation would be sufficient and more "natural". |
The reason I am replacing symbols is to avoid problems with URLs. Essentially I had titles such as I initially was keeping the slash because I had multiple federation end-points under a single address. E.g. /help/, /wiki/, /dev/ of course now I'm changing it to properly distributed. The main use-case I can see for paths is generated services and converted pages:
One case I have is that we have multiple application versions and each one has a separate version of the help pages - so it would be nice to serve them from under help.raintreeinc.com/500, help.raintreeinc.com/400... instead of help500.raintreeinc.com (or something similar)... that approach also reduces the DNS maintenance overhead. Essentially regarding |
Forgot one of my use cases... providing a citations listing, e.g. |
Somewhat off topic, regarding slugs and internationalization, but related Elsewhere, WardCunningham/Smallest-Federated-Wiki#412, there are some initial thoughts on changing the story serialization. This is somewhat connected insofar as it might open a few possibilities. While I'm not sure that I would go quite the same route having thought about this for nearly a year I would probably use something more like Not quite sure how services like I'm really not sure if I've seen a slugify routine that I am really happy with. They all seem to have some shortcomings, the best remove accents rather than dropping letters as well as replacing ligatures. I wonder if it would not be better to simply URL encode the page title - most modern browsers will do their thing and make it readable. It is then up to the server how it maps this onto the name it will use for storage. The big downside is that the slug is used to refer to the page in many places, so this would be a big change. |
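To illustrate the URL-encoding idea: percent-encoding is lossless, so the server can always recover the exact title and decide for itself how to store it. The sketch below uses only the standard library; `title_to_path` and `path_to_title` are hypothetical helper names.

```python
from urllib.parse import quote, unquote


def title_to_path(title):
    # Percent-encode the UTF-8 bytes of the title; safe="" also encodes "/".
    return "/" + quote(title, safe="")


def path_to_title(path):
    # Reverse the encoding to recover the original title on the server.
    return unquote(path[1:])
```

Unlike a slug, the round trip preserves case, accents and non-Latin scripts, which is why the mapping to a storage name can be left entirely to the server.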
The I'm currently using Although I also tried the I'm not exactly sure what you mean by that downside of "slug is used to refer to the page in many places". Could you elaborate? |
Just musing that if we moved to using url encoding for making server requests, and doing the slug generation on the server, that there are knock-on effects in the client. The more I look at this, the more I get the feeling that all the slugify routines are a relic from the past, before any internationalization of the web. The different national versions of wikipedia look to use a mix of url encoding and a minimal set of character replacement (replacing |
I'm closing this, as there has been no activity in over a year. |
I am creating a wiki farm for an international project related to permaculture. The ability to title pages with non-Latin characters is an absolute must. Absolutely everything else about Federated Wiki is squarely aligned with this project's needs. Naively, just jumping into this thread 6 years after it was closed, I like the approach of imitating Wikipedia's URL-encoding-based solution on the front end and letting the back end worry about how to persist it. Punycode would be another possible encoding that might offer a path for compatibility. I am motivated to fix this for my community and would really love to do so in a way that fixes it for the world and fits with this project. |
Hi @replaid ! Glad you are joining the conversation here. To ease the conversation and development process, though, I would suggest to open up a new issue, preferably in the overarching project https://github.com/fedwiki/wiki/issues (as GitHub discussions are disabled). I am myself responsible for taking care of the legacy from Silke Helfrich and her fabulous work with David Bollier on the pattern language of commoning, published in Federated Wikis at: There are a few more language editions coming up next (French, Spanish, Greek, and what have you) We are entering uncharted land here, so it will be on us to provide the Wiki community not only with requirements, but also with good practices, possible workarounds and eventually original development. The subjects of translation, and more broadly, internationalisation (i18n) will bring much joy and nuts to crack, so we ought to continue to work slowly, but surely on this. I went ahead and filed a new issue for this subject: Please feel free to continue our conversation there. |
@replaid can you give us some examples of "pages titled with non-Latin characters an absolute must." Are you expecting collaboration to work across languages? How? For the sake of this conversation, perhaps you can give us English translations too. |
The author below is compiling permaculture articles in Russian and we anticipate collaboration between speakers of Russian, English, and Spanish in the near term, possibly other languages as well (the next one on the list would be right-to-left…). This would be titled "Guilds" in English. Here it is with an English-language translation too: http://john.permakultura.wiki/view/welcome-visitors/miron.permakultura.wiki/gildii/view/guilds The limitation of Latin letters for the page title is the only roadblock I am aware of to having a normal wiki workflow and interaction. But we anticipate a fairly broad-based selection, i.e. not all academics—I believe it will be pretty important to be able to support a fully native-language workflow. |
I would want to name this page in its own script and cite it the same way: Russian is a phonetic alphabet, and can be very effectively transliterated into Latin letters (in this case I have been looking at Punycode, Nameprep, and stringprep and I think it has potential—the main thing I'm aware of that I need to think through is how it would interact with wiki when searching for pages with the use of substrings. Otherwise I think it's very close to our needs with a similar history of starting with a Latin-only system and extending into Unicode with security concerns, etc. |
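A small sketch of the Punycode idea and the substring concern, using Python's built-in `punycode` codec. It applies no Nameprep or stringprep normalization, and the helper names are hypothetical. Because the encoded form does not preserve substrings of the original title, this sketch decodes before searching.

```python
def to_punycode(title):
    # Encode a Unicode title into an ASCII-only string.
    return title.encode("punycode").decode("ascii")


def from_punycode(encoded):
    # Recover the original Unicode title from its ASCII form.
    return encoded.encode("ascii").decode("punycode")


def search_titles(encoded_titles, fragment):
    # Substring search has to run on the decoded titles, since a fragment
    # of the title is generally not a fragment of the encoded string.
    needle = fragment.casefold()
    return [
        title
        for title in map(from_punycode, encoded_titles)
        if needle in title.casefold()
    ]
```

Searching a list of stored, encoded names for "гильд" would then find "Гильдии" even though the encoded name contains no Cyrillic at all.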
Would an author want to identify Gil'dii as a synonym for Гильдии? |
Internationalisation is not just an issue for the client. To ensure that all aspects are covered it will be best to discuss this subject over in fedwiki/wiki#139, rather than in this closed PR that was just a start at exploring one small corner of this issue. |
In terms of its value to the user experience, this would be like asking an English speaker to please type I've created an issue specifically for page titles at fedwiki/wiki#140. I'm optimistic that this can be done in a compatible way without a huge lift. |
Just a start for review
a link to Åsnes generates a request URL:
http://localhost:3000/snes.json?random=14cc3a4d&title=%C3%85snes
title provides something that a server might want to work with to find the page's data.
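The request above can be reproduced with a short sketch. It assumes the slug rule simply drops characters outside `A-Za-z0-9-`, which is consistent with Åsnes becoming `snes`; `build_request_url` and `conventional_slug` are hypothetical names, not functions from the client.

```python
import re
from urllib.parse import quote, urlencode


def conventional_slug(title):
    # Assumed model of the existing slug rule (non-ASCII letters are dropped).
    return re.sub(r"[^A-Za-z0-9-]", "", re.sub(r"\s", "-", title)).lower()


def build_request_url(origin, title, random_token):
    # The path still uses the lossy slug; the original title travels
    # alongside it as a percent-encoded query parameter.
    query = urlencode({"random": random_token, "title": title}, quote_via=quote)
    return f"{origin}/{conventional_slug(title)}.json?{query}"
```

Calling `build_request_url("http://localhost:3000", "Åsnes", "14cc3a4d")` yields the URL shown above, and a server that ignores `title` still sees the conventional slug in the path.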